target area
Color Visual Illusions: A Statistics-based Computational Model
However, neither the data nor the tools existed in the past to extensively support these explanations. The era of big data opens a new opportunity to study input-driven approaches. We introduce a tool that computes the likelihood of patches, given a large dataset to learn from. Given this tool, we present a model that supports the approach and explains lightness and color visual illusions in a unified manner.
TRACE: Textual Reasoning for Affordance Coordinate Extraction
Park, Sangyun, Kim, Jin, Cui, Yuchen, Brown, Matthew S.
Vision-Language Models (VLMs) struggle to translate high-level instructions into the precise spatial affordances required for robotic manipulation. While visual Chain-of-Thought (CoT) methods exist, they are often computationally intensive. In this work, we introduce TRACE (Textual Reasoning for Affordance Coordinate Extraction), a novel methodology that integrates a textual Chain of Reasoning (CoR) into the affordance prediction process. We use this methodology to create the TRACE dataset, a large-scale collection created via an autonomous pipeline that pairs instructions with explicit textual rationales. By fine-tuning a VLM on this data, our model learns to externalize its spatial reasoning before acting. Our experiments show that our TRACE-tuned model achieves state-of-the-art performance, reaching 48.1% accuracy on the primary Where2Place (W2P) benchmark (a 9.6% relative improvement) and 55.0% on the more challenging W2P(h) subset. Crucially, an ablation study demonstrates that performance scales directly with the amount of reasoning data used, confirming the CoR's effectiveness. Furthermore, analysis of the model's attention maps reveals an interpretable reasoning process where focus shifts dynamically across reasoning steps. This work shows that training VLMs to generate a textual CoR is an effective and robust strategy for enhancing the precision, reliability, and interpretability of VLM-based robot control. Our dataset and code are available at https://github.com/jink-ucla/TRACE
- North America > United States > California > Los Angeles County > Los Angeles (0.29)
- Asia > South Korea > Seoul > Seoul (0.04)
- Research Report > New Finding (0.46)
- Research Report > Experimental Study (0.46)
Color Visual Illusions: A Statistics-based Computational Model
The era of big data opens a new opportunity to study input-driven approaches. We introduce a tool that computes the likelihood of patches, given a large dataset to learn from. Given this tool, we present a model that supports the approach and explains lightness and color visual illusions in a unified manner.
- Asia > Middle East > Israel (0.04)
- North America > Canada (0.04)
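As a rough illustration of what "computing the likelihood of patches, given a large dataset" can mean, here is a minimal sketch that fits a single multivariate Gaussian to grayscale patches and scores a new patch by its log-likelihood. The Gaussian density, the `ridge` regularizer, and all function names are assumptions made for illustration; the abstract does not describe the authors' tool or how color is handled.

```python
import numpy as np

def extract_patches(image, size):
    """Return every size x size patch of a 2-D grayscale image as a flattened row."""
    h, w = image.shape
    return np.array([image[i:i + size, j:j + size].ravel()
                     for i in range(h - size + 1)
                     for j in range(w - size + 1)])

def fit_gaussian(patches, ridge=1e-3):
    """Estimate mean and covariance of the patches (assumed Gaussian model).

    The ridge term keeps the covariance invertible when patches are highly correlated.
    """
    mean = patches.mean(axis=0)
    cov = np.cov(patches, rowvar=False) + ridge * np.eye(patches.shape[1])
    return mean, cov

def log_likelihood(patch, mean, cov):
    """Log-density of one patch under the fitted Gaussian."""
    d = patch.ravel() - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = d @ np.linalg.solve(cov, d)  # squared Mahalanobis distance
    return -0.5 * (maha + logdet + len(d) * np.log(2 * np.pi))
```

Under this assumed model, a patch that resembles the training data scores higher than one with an unusual pixel arrangement, which is the kind of signal a statistics-based account of illusions would draw on.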
Cooperative Target Detection with AUVs: A Dual-Timescale Hierarchical MARDL Approach
Xueyao, Zhang, Bo, Yang, Zhiwen, Yu, Xuelin, Cao, Alexandropoulos, George C., Debbah, Merouane, Yuen, Chau
Autonomous Underwater Vehicles (AUVs) have shown great potential for cooperative detection and reconnaissance. However, collaborative AUV communications introduce risks of exposure. In adversarial environments, achieving efficient collaboration while ensuring covert operations becomes a key challenge for underwater cooperative missions. In this paper, we propose a novel dual time-scale Hierarchical Multi-Agent Proximal Policy Optimization (H-MAPPO) framework. The high-level component determines the individuals participating in the task based on a central AUV, while the low-level component reduces exposure probabilities through power and trajectory control by the participating AUVs. Simulation results show that the proposed framework achieves rapid convergence, outperforms benchmark algorithms in terms of performance, and maximizes long-term cooperative efficiency while ensuring covert operations.
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
- Asia > China > Shaanxi Province > Xi'an (0.04)
- Asia > China > Heilongjiang Province > Harbin (0.04)
- (3 more...)
Phi-Ground Tech Report: Advancing Perception in GUI Grounding
Zhang, Miaosen, Xu, Ziqiang, Zhu, Jialiang, Dai, Qi, Qiu, Kai, Yang, Yifan, Luo, Chong, Chen, Tianyi, Wagle, Justin, Franklin, Tim, Guo, Baining
With the development of multimodal reasoning models, Computer Use Agents (CUAs), akin to Jarvis from "Iron Man", are becoming a reality. GUI grounding is a core component for CUAs to execute actual actions, similar to mechanical control in robotics, and it directly leads to the success or failure of the system. It determines actions such as clicking and typing, as well as related parameters like the coordinates for clicks. Current end-to-end grounding models still achieve less than 65% accuracy on challenging benchmarks like ScreenSpot-pro and UI-Vision, indicating they are far from being ready for deployment. In this work, we conduct an empirical study on the training of grounding models, examining details from data collection to model training. Ultimately, we developed the Phi-Ground model family, which achieves state-of-the-art performance across all five grounding benchmarks for models under 10B parameters in agent settings. In the end-to-end model setting, our model still achieves SOTA results with scores of 43.2 on ScreenSpot-pro and 27.2 on UI-Vision. We believe that the various details discussed in this paper, along with our successes and failures, not only clarify the construction of grounding models but also benefit other perception tasks. Project homepage: https://zhangmiaosen2000.github.io/Phi-Ground/
- Information Technology > Software (0.45)
- Information Technology > Security & Privacy (0.45)
Decentralized Consensus Inference-based Hierarchical Reinforcement Learning for Multi-Constrained UAV Pursuit-Evasion Game
Yuming, Xiang, Sizhao, Li, Rongpeng, Li, Zhifeng, Zhao, Honggang, Zhang
Multiple quadrotor unmanned aerial vehicle (UAV) systems have garnered widespread research interest and fostered many interesting applications, especially in multi-constrained pursuit-evasion games (MC-PEG). The Cooperative Evasion and Formation Coverage (CEFC) task, where the UAV swarm aims to maximize formation coverage across multiple target zones while collaboratively evading predators, is one of the most challenging problems in MC-PEG, especially under communication-limited constraints. This multifaceted problem, which intertwines responses to obstacles, adversaries, target zones, and formation dynamics, introduces significant high-dimensional complications in finding a solution. In this paper, we propose a novel two-level framework (i.e., Consensus Inference-based Hierarchical Reinforcement Learning (CI-HRL)), which delegates target localization to a high-level policy, while adopting a low-level policy to manage obstacle avoidance, navigation, and formation. Specifically, in the high-level policy, we develop a novel multi-agent reinforcement learning module, Consensus-oriented Multi-Agent Communication (ConsMAC), to enable agents to perceive global information and establish consensus from local states by effectively aggregating neighbor messages. Meanwhile, we leverage an Alternative Training-based Multi-agent proximal policy optimization (AT-M) and policy distillation to accomplish the low-level control. The experimental results, including the high-fidelity software-in-the-loop (SITL) simulations, validate that CI-HRL provides a superior solution with enhanced collaborative evasion and task completion capabilities for the swarm.
Nowadays, quadrotor Unmanned Aerial Vehicles (UAVs) have demonstrated great potential in costly or human-unfriendly tasks (e.g., disaster response [1]), due to their agility, cost-effectiveness, and compact size. Nevertheless, the UAV swarm is likely to be exposed to an adversarial environment, where a hostile factor or agent might attack the affiliated members, and must respond promptly to improve its chances of survival.
Yuming Xiang, Sizhao Li, and Rongpeng Li are with the College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou 310027, China (email: {xiangym1999; liszh5; lirongpeng}@zju.edu.cn).
- Asia > China > Zhejiang Province > Hangzhou (0.24)
- Europe > Austria > Vienna (0.14)
- Asia > Macao (0.04)
- (7 more...)
- Research Report > Promising Solution (0.34)
- Instructional Material > Course Syllabus & Notes (0.34)
- Information Technology (0.48)
- Aerospace & Defense > Aircraft (0.34)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)
Tru-POMDP: Task Planning Under Uncertainty via Tree of Hypotheses and Open-Ended POMDPs
Tang, Wenjing, He, Xinyu, Huang, Yongxi, Xiao, Yunxiao, Lu, Cewu, Cai, Panpan
Task planning under uncertainty is essential for home-service robots operating in the real world. Tasks involve ambiguous human instructions, hidden or unknown object locations, and open-vocabulary object types, leading to significant open-ended uncertainty and a boundlessly large planning space. To address these challenges, we propose Tru-POMDP, a planner that combines structured belief generation using Large Language Models (LLMs) with principled POMDP planning. Tru-POMDP introduces a hierarchical Tree of Hypotheses (TOH), which systematically queries an LLM to construct high-quality particle beliefs over possible world states and human goals. We further formulate an open-ended POMDP model that enables rigorous Bayesian belief tracking and efficient belief-space planning over these LLM-generated hypotheses. Experiments on complex object rearrangement tasks across diverse kitchen environments show that Tru-POMDP significantly outperforms state-of-the-art LLM-based and LLM-tree-search hybrid planners, achieving higher success rates with significantly better plans, stronger robustness to ambiguity and occlusion, and greater planning efficiency.
- Asia > China > Shanghai > Shanghai (0.04)
- North America > United States > Florida > Miami-Dade County > Miami (0.04)
- Asia > China > Beijing > Beijing (0.04)
- Workflow (0.68)
- Research Report (0.50)
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
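The Bayesian belief tracking over particle hypotheses mentioned in the Tru-POMDP abstract can be illustrated with a minimal reweighting sketch. Everything concrete here is an assumption for illustration: the `Hypothesis` fields, the hard-coded particles (standing in for LLM-generated hypotheses), and the simple detection model with probability `p_detect` and no false positives. It is not the paper's open-ended POMDP model.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Hypothesis:
    """One particle: a guessed object location plus a guessed human goal."""
    object_location: str
    goal: str

def observation_likelihood(hyp, container, found, p_detect=0.9):
    """P(observation | hypothesis) after looking inside one container.

    Assumed sensor model: the object is seen with probability p_detect if the
    hypothesis places it in that container, and never seen otherwise.
    """
    p_found = p_detect if hyp.object_location == container else 0.0
    return p_found if found else 1.0 - p_found

def update_belief(particles, weights, container, found, p_detect=0.9):
    """Bayes rule on a particle belief: reweight by likelihood, then normalize."""
    unnormalized = [w * observation_likelihood(h, container, found, p_detect)
                    for h, w in zip(particles, weights)]
    total = sum(unnormalized)
    if total == 0.0:
        # No particle explains the observation; this sketch simply gives up.
        raise ValueError("observation is inconsistent with every hypothesis")
    return [u / total for u in unnormalized]
```

For example, with prior weights 0.5, 0.3, and 0.2 on "fridge", "cabinet", and "table", looking in the fridge and finding nothing shifts most of the belief mass to the other two locations while leaving a small residual on the fridge to account for a missed detection.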
Rapid AI-based generation of coverage paths for dispensing applications
Baeuerle, Simon, Mendonca, Ian F., Van Laerhoven, Kristof, Mikut, Ralf, Steimer, Andreas
Coverage Path Planning of Thermal Interface Materials (TIM) plays a crucial role in the design of power electronics and electronic control units. Until now, this has been done manually by experts or with computationally expensive optimization approaches. We propose a novel AI-based approach to generate dispense paths for TIM and similar dispensing applications. It is a drop-in replacement for optimization-based approaches. An Artificial Neural Network (ANN) receives the target cooling area as input and directly outputs the dispense path. Our proposed setup does not require labels and we show its feasibility on multiple target areas. The resulting dispense paths can be directly transferred to automated manufacturing equipment and do not exhibit air entrapments. The approach of using an ANN to predict process parameters for a desired target state in real-time could potentially be transferred to other manufacturing processes.
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.05)
- Europe > Germany > North Rhine-Westphalia > Arnsberg Region > Siegen (0.05)
- Europe > Germany > Baden-Württemberg > Karlsruhe Region > Karlsruhe (0.05)
- Asia > Singapore (0.05)
Can Reasoning Models Reason about Hardware? An Agentic HLS Perspective
Collini, Luca, Hennessee, Andrew, Karri, Ramesh, Garg, Siddharth
Recent Large Language Models (LLMs) such as OpenAI o3-mini and DeepSeek-R1 use enhanced reasoning through Chain-of-Thought (CoT). Their potential in hardware design, which relies on expert-driven iterative optimization, remains unexplored. This paper investigates whether reasoning LLMs can address challenges in High-Level Synthesis (HLS) design space exploration and optimization. During HLS, engineers manually define pragmas/directives to balance performance and resource constraints. We propose an agentic LLM-based optimization framework that automatically restructures code, inserts pragmas, and identifies optimal design points via feedback from HLS tools and access to integer-linear programming (ILP) solvers. Experiments compare reasoning models against conventional LLMs on benchmarks using success rate, efficiency, and design quality (area/latency) metrics, and provide the first-ever glimpse into the CoTs produced by a powerful open-source reasoning model like DeepSeek-R1.
Channel Gain Map Construction based on Subregional Learning and Prediction
Chen, Jiayi, Gao, Ruifeng, Wang, Jue, Sun, Shu, Wu, Yi
The construction of a channel gain map (CGM) is essential for realizing the environment-aware wireless communications expected in 6G, for which a fundamental problem is how to effectively predict the channel gains at unknown locations from a finite number of measurements. As a single prediction model is not effective in complex propagation environments, we propose a subregional learning-based CGM construction scheme, in which the entire map is divided into subregions via data-driven clustering, and individual models are then constructed and trained for each subregion. In this way, the specific propagation features of each subregion can be better extracted with finite training data. Moreover, we propose to further improve prediction accuracy through uneven subregion sampling and the reuse of training data around subregion boundaries. Simulation results validate the effectiveness of the proposed scheme in CGM construction.
To support the greatly increased data demands and connection requirements, communication networks are becoming more complex in the forthcoming 6G era [1]. This brings challenges to low-complexity network deployment optimization and transmission design [2]. Environment-aware communication provides a promising solution to this challenge, and it requires communication-related environment information, also known as the channel knowledge map (CKM) [3], [4], as side information to be exploited when designing the system. In general, the CKM can be represented as a site-specific database, which provides information on the channel parameters of interest at given geometric locations. Depending on the particular channel information it conveys, different types of CKM have been studied, including the channel shadowing map [5], the channel gain map (CGM) [6], and the beam index map [7]. Different CKMs can be exploited for different tasks.
- Asia > China > Shanghai > Shanghai (0.04)
- North America > United States > Texas > Travis County > Austin (0.04)
- Europe > France > Île-de-France > Paris > Paris (0.04)
- Asia > South Korea > Seoul > Seoul (0.04)
- Information Technology > Modeling & Simulation (1.00)
- Information Technology > Communications (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.70)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)
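The subregional idea in the channel gain map abstract (cluster the map, then train one model per subregion) can be sketched as follows. This is a minimal illustration under stated assumptions: plain k-means stands in for the data-driven clustering, a linear least-squares fit stands in for each subregion's model, and the uneven sampling and boundary data reuse from the abstract are not implemented. All function names are hypothetical.

```python
import numpy as np

def nearest_center(points, centers):
    """Index of the closest center for every point."""
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)

def kmeans(points, k, iters=50, seed=0):
    """Plain k-means on measurement locations; returns (centers, labels)."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        labels = nearest_center(points, centers)
        for j in range(k):
            if np.any(labels == j):  # keep the old center if a cluster is empty
                centers[j] = points[labels == j].mean(axis=0)
    return centers, nearest_center(points, centers)

def fit_subregional(points, gains, k):
    """Fit one linear model gain ~ a*x + b*y + c per clustered subregion."""
    centers, labels = kmeans(points, k)
    X = np.hstack([points, np.ones((len(points), 1))])
    models = []
    for j in range(k):
        mask = labels == j
        # Fall back to all data when a subregion has too few measurements.
        rows = mask if mask.sum() >= 3 else np.ones(len(points), dtype=bool)
        coef, *_ = np.linalg.lstsq(X[rows], gains[rows], rcond=None)
        models.append(coef)
    return centers, np.array(models)

def predict(centers, models, query):
    """Predict gains at query locations using the nearest subregion's model."""
    labels = nearest_center(query, centers)
    Xq = np.hstack([query, np.ones((len(query), 1))])
    return np.einsum("ij,ij->i", Xq, models[labels])
```

When the propagation behavior differs between areas of the map, the per-subregion fits can follow each area's trend, whereas a single global fit has to compromise between them.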